Reverberation robust speech recognition by matching distributions of spectrally and temporally decorrelated features
Abstract
This paper addresses dereverberation of speech with an unsupervised approach that uses a speech prior and makes only weak assumptions about the reverberation. The approach represents reverberated speech over a long temporal context as spectral-temporal supervectors, which are decorrelated by PCA. In the decorrelated domain, supervectors are mapped from the reverberant speech distribution to the clean speech distribution and then to mel-spectral vectors. A mel-domain Wiener filter is applied as post-processing. Our results demonstrate performance gains over the provided baseline recognizer and show that the method can be coupled with CMLLR adaptation with cumulative benefits for clean-trained models. Furthermore, we show that dimensionality reduction coupled with the Wiener filter is better than full-dimensional PCA at representing small-variance components in speech.
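The pipeline described in the abstract (stack frames into supervectors, decorrelate with PCA, match distributions, map back to mel vectors) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption for illustration, not the authors' implementation: the function names, fitting the PCA on clean data, matching each decorrelated component independently by empirical quantile mapping, and returning the centre frame of each supervector. The Wiener post-filter and CMLLR adaptation are omitted.

```python
import numpy as np

def stack_supervectors(mel, context):
    """Stack `context` consecutive mel frames (T x B) into rows of length context*B."""
    T, _ = mel.shape
    return np.stack([mel[t:t + context].ravel() for t in range(T - context + 1)])

def fit_pca(X):
    """Return the data mean and an orthonormal PCA basis (columns), via SVD."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt.T

def match_distribution(src, ref):
    """Map each column of `src` onto the empirical distribution of the
    same column of `ref` (assumed here: rank-based quantile mapping)."""
    out = np.empty_like(src)
    n = src.shape[0]
    for d in range(src.shape[1]):
        ranks = np.argsort(np.argsort(src[:, d]))   # rank of each sample
        q = (ranks + 0.5) / n                       # empirical quantile level
        out[:, d] = np.quantile(ref[:, d], q)       # value at that level in ref
    return out

def enhance(reverb_mel, clean_mel, context=5, n_components=20):
    """Hypothetical end-to-end sketch; `context` and `n_components` are arbitrary."""
    X_rev = stack_supervectors(reverb_mel, context)
    X_cln = stack_supervectors(clean_mel, context)
    mean, basis = fit_pca(X_cln)          # assumption: PCA fitted on clean speech
    basis = basis[:, :n_components]       # dimensionality reduction
    Z_rev = (X_rev - mean) @ basis        # decorrelated reverberant features
    Z_cln = (X_cln - mean) @ basis        # decorrelated clean features
    Z_hat = match_distribution(Z_rev, Z_cln)
    X_hat = Z_hat @ basis.T + mean        # back to the supervector domain
    B = reverb_mel.shape[1]
    mid = context // 2
    return X_hat[:, mid * B:(mid + 1) * B]  # assumption: keep the centre frame
```

Calling `enhance(reverb_mel, clean_mel)` with two log-mel matrices of shape `(frames, bands)` returns one enhanced mel vector per supervector, that is, `frames - context + 1` rows. Truncating the basis to `n_components` corresponds to the dimensionality reduction the abstract contrasts with full-dimensional PCA.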
Similar resources
An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have shown their performance in speech recognition systems for feature extraction and acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. A Convolutive Bottleneck Network (CBN) is a kind of CNN which has a bottleneck layer among its fully connected layers. The bottleneck fea...
Robust Features and System Fusion for Reverberation-robust Speech Recognition
Reverberation in speech degrades the performance of speech recognition systems, leading to higher word error rates. Human listeners can often ignore reverberation, indicating that the auditory system somehow compensates for reverberation degradations. In this work, we present robust acoustic features motivated by the knowledge gained from human speech perception and production, and we demonstra...
Feature enhancement of reverberant speech by distribution matching and non-negative matrix factorization
This paper describes a novel two-stage dereverberation feature enhancement method for noise-robust automatic speech recognition. In the first stage, an estimate of the dereverberated speech is generated by matching the distribution of the observed reverberant speech to that of clean speech, in a decorrelated transformation domain that has a long temporal context in order to address the effects ...
A Missing Data Approach for Robust Automatic Speech Recognition in the Presence of Reverberation
We describe a technique for robust recognition of reverberated speech using the ‘missing data’ paradigm. Modulation filtering is used to identify time-frequency regions of the speech signal which are relatively uncontaminated by reverberation and contain strong speech energy; only these ‘reliable’ acoustic features are made directly available to the recogniser. The proposed system is evaluated ...
Combating reverberation in large vocabulary continuous speech recognition
Reverberation leads to high word error rates (WERs) for automatic speech recognition (ASR) systems. This work presents robust acoustic features motivated by subspace modeling and human speech perception for use in large vocabulary continuous speech recognition (LVCSR). We explore different acoustic modeling strategies and language modeling techniques, and demonstrate that robust features with a...